Stupid Lucene Tricks: Search case-insensitive, Retrieve ca

Mark Leighton Fisher on 2010-06-17T10:58:39

Sometimes when you build an index in Lucene, you want people to be able to search without worrying about case (case-insensitive search), but you want the display to contain the original mixed-case data (case-sensitive display). The trick is to split each Lucene field into two versions:

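  1. A case-insensitive field that is indexed but not stored (Lucene.Net.Documents.Field.Index.ANALYZED and Lucene.Net.Documents.Field.Store.NO). The case-insensitivity comes from the analyzer, so this assumes an analyzer that lowercases its tokens, as StandardAnalyzer does.
  2. A case-sensitive field that is stored but not indexed (Lucene.Net.Documents.Field.Index.NO and Lucene.Net.Documents.Field.Store.YES), preferably with a field name similar to that of its case-insensitive cousin, like "Display_Title" for "Title". Note that Field.Index.NOT_ANALYZED would still index the value, as a single untokenized term, so it does not give you a stored-only field.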
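
To make the split concrete, here is a minimal sketch in plain Python rather than Lucene itself. It only models what the two fields do: a lowercased inverted index plays the part of the indexed-but-not-stored "Title" field, and a plain dictionary plays the part of the stored-but-not-indexed "Display_Title" field. The class name TwoFieldIndex, its methods, and the sample titles are all hypothetical, and the simple word-splitting stands in for whatever analyzer you actually use.

```python
import re
from collections import defaultdict


class TwoFieldIndex:
    """Toy model (not Lucene): one search-only field, one display-only field."""

    def __init__(self):
        # Plays the role of "Title": lowercased tokens -> doc ids (indexed, not stored).
        self._postings = defaultdict(set)
        # Plays the role of "Display_Title": doc id -> original text (stored, not indexed).
        self._display = {}

    def add(self, doc_id, title):
        # Assumption: the analyzer lowercases and splits on non-word characters.
        for token in re.findall(r"\w+", title.lower()):
            self._postings[token].add(doc_id)
        self._display[doc_id] = title

    def search(self, term):
        # Lowercase the query term the same way the indexed tokens were lowercased,
        # then return the original mixed-case text for each matching document.
        doc_ids = self._postings.get(term.lower(), set())
        return [self._display[doc_id] for doc_id in sorted(doc_ids)]


index = TwoFieldIndex()
index.add(1, "Stupid Lucene Tricks")
index.add(2, "LUCENE for Beginners")

print(index.search("lucene"))
```

Searching for "lucene", "Lucene", or "LUCENE" finds both documents, and each result comes back with its original capitalization because the display text was never run through the lowercasing step. In Lucene.Net the same roles are filled by the two Field instances added to each Document.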

Storing only the case-sensitive version keeps the extra storage cost down, since the case-insensitive copy lives only in the index terms. The trick is not free, though: I have seen around a 40% increase in index size compared to a single field that is both stored and indexed.